
    Analysis of Scalable Algorithms for Dynamic Load Balancing and Mapping with Application to Photo-realistic Rendering

    This thesis presents and analyzes scalable algorithms for dynamic load balancing and mapping in distributed computer systems. The algorithms are distributed and concurrent, have no central thread of control, and require no centralized communication. They are derived using spectral properties of graphs: graphs of physical network links among computers in the load balancing problem, and graphs of logical communication channels among processes in the mapping problem. A distinguishing characteristic of these algorithms is that they are scalable: the expected cost of execution does not increase with problem scale. This is proven in a scalability theorem which shows that, for several simple disturbance models, the rate of convergence to a solution is independent of scale. This property is extended through simulated examples and informal argument to general and random disturbances. A worst-case disturbance is presented and shown to occur with vanishing probability as the problem scale increases. To verify these conclusions, the load balancing algorithm is deployed in support of a photo-realistic rendering application, based on Monte Carlo path tracing, on a parallel computer system. The performance and scaling of this application, and of the dynamic load balancing algorithm, are measured on different numbers of computers. The results are consistent with the predictions of scalability, and the cost of load balancing is seen to be non-increasing for increasing numbers of computers. The quality of load balancing is evaluated and compared with the quality of solutions produced by competing approaches for up to 1,024 computers. This comparison shows that the algorithm presented here is as good as or better than the most popular competing approaches for this application.
The thesis then presents the dynamic mapping algorithm, with simulations of a model problem, and suggests that the pair of algorithms presented here may be an ideal complement to more expensive algorithms such as the well-known recursive spectral bisection.
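    To illustrate the nearest-neighbour diffusion idea behind such algorithms, the sketch below balances scalar load values over an arbitrary graph using only local exchanges between neighbours; the graph, diffusion parameter, and step count are invented for illustration and are not taken from the thesis.

```python
# Illustrative sketch: diffusive load balancing on an arbitrary graph.
# Each step, every node moves a fraction alpha of the load difference
# across each of its edges. No central coordinator is needed; each node
# reads only its neighbours' loads.
def diffuse(load, adjacency, alpha=0.25, steps=50):
    load = list(load)
    for _ in range(steps):
        delta = [0.0] * len(load)
        for i, neighbours in enumerate(adjacency):
            for j in neighbours:
                delta[i] += alpha * (load[j] - load[i])
        load = [l + d for l, d in zip(load, delta)]
    return load

# A 4-node ring with all load initially on node 0 (hypothetical example).
ring = [[1, 3], [0, 2], [1, 3], [2, 0]]
balanced = diffuse([8.0, 0.0, 0.0, 0.0], ring)
```

    The exchanges conserve total load, so the iteration converges to the uniform distribution; the convergence rate depends on the spectrum of the graph Laplacian, which is where the spectral analysis enters.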

    Scalable Load Balancing by Diffusion

    [No abstract available]

    Scalable Interactive Volume Rendering Using Off-the-shelf Components

    This paper describes an application of a second-generation implementation of the Sepia architecture (Sepia-2) to interactive volumetric visualization of large rectilinear scalar fields. By employing pipelined associative blending operators in a sort-last configuration, a demonstration system with 8 rendering computers sustains 24 to 28 frames per second while interactively rendering large data volumes (1024x256x256 voxels, and 512x512x512 voxels). We believe interactive performance at these frame rates and data sizes is unprecedented. We also believe these results can be extended to other types of structured and unstructured grids and a variety of GL rendering techniques including surface rendering and shadow mapping. We show how to extend our single-stage crossbar demonstration system to multi-stage networks in order to support much larger data sizes and higher image resolutions. This requires solving a dynamic mapping problem for a class of blending operators that includes Porter-Duff compositing operators.
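    The key algebraic property exploited by such sort-last pipelines is that Porter-Duff compositing on premultiplied RGBA is associative, so partial images can be merged pairwise in any bracketing. A minimal sketch of the "over" operator (illustrative values only, not Sepia-2 code):

```python
# Porter-Duff "over" on premultiplied RGBA tuples (r, g, b, a).
# Associativity is what lets a compositing network combine partial
# images in any grouping and still get the same final pixel.
def over(front, back):
    fr, fg, fb, fa = front
    br, bg, bb, ba = back
    k = 1.0 - fa                      # transmittance of the front layer
    return (fr + k * br, fg + k * bg, fb + k * bb, fa + k * ba)

a = (0.5, 0.0, 0.0, 0.5)   # premultiplied half-opaque red
b = (0.0, 0.3, 0.0, 0.3)
c = (0.0, 0.0, 0.8, 0.8)

left  = over(over(a, b), c)   # (a over b) over c
right = over(a, over(b, c))   # a over (b over c)
```

    Because `left` and `right` agree, the merge tree's shape is free to follow the network topology rather than the depth order of the data.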

    A Parabolic Theory of Load Balance

    We derive analytical results for a dynamic load balancing algorithm modeled by the heat equation u_t = ∇²u. The model is appropriate for quickly diffusing disturbances in a local region of a computational domain without affecting other parts of the domain. The algorithm is useful for problems in computational fluid dynamics which involve moving boundaries and adaptive grids implemented on mesh-connected multicomputers. The algorithm preserves task locality and uses only local communication. Resulting load distributions approximate time-asymptotic solutions of the heat equation. As a consequence it is possible to predict both the rate of convergence and the quality of the final load distribution. These predictions suggest that a typical imbalance on a multicomputer with over a million processors can be reduced by one order of magnitude after 10^5 arithmetic operations at each processor. For large n the time complexity to reduce the expected imbalance is effectively independent of n.
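    The discrete form of u_t = ∇²u on a chain of processors is the familiar explicit three-point stencil, stable for a diffusion coefficient alpha ≤ 0.5. The sketch below is a generic illustration of that update with reflecting (Neumann) ends; the chain length, alpha, and step count are assumptions, not the paper's parameters.

```python
# Explicit heat-equation update on a 1-D processor chain with
# reflecting ends: each node exchanges load only with its two
# neighbours, so total load is conserved at every step.
def heat_step(u, alpha=0.25):
    n = len(u)
    return [u[i]
            + alpha * ((u[i - 1] if i > 0 else u[i]) - u[i])
            + alpha * ((u[i + 1] if i < n - 1 else u[i]) - u[i])
            for i in range(n)]

u = [16.0] + [0.0] * 15          # one overloaded processor in a chain of 16
for _ in range(400):
    u = heat_step(u)
```

    The slowest-decaying imbalance mode shrinks by a factor of 1 - alpha * 4 sin²(pi / 2n) per step, which is how the rate-of-convergence predictions in the abstract arise.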

    Parallel graphics and visualization

    Computer graphics and visualization are very active fields of Computer Science, continuously producing new and exciting results. However, the demand for increasingly faster feedback, together with the huge volume of data usually associated with these applications, results in growing computational requirements. An efficient utilization of a multiplicity of computational and visualization resources expedites data processing for image generation, thus enabling such requirements to be met. This special issue of Parallel Computing presents a selection of six of the 21 papers published at the 2006 Eurographics Symposium on Parallel Graphics and Visualization, which was held in May 2006 in Braga, Portugal. The Eurographics Symposium on Parallel Graphics and Visualization focuses on theoretical and applied research issues critical to parallel and distributed computing and its application to all aspects of computer graphics, virtual reality, and scientific and engineering visualization. Parallel graphics and visualization has evolved dramatically in the last few years. While previous works focused on SIMD architectures and standard PC clusters, more recent research has moved to large displays and visualization-oriented cluster architectures, which include graphics processing units at each node. This trend can be observed in the papers selected for this special issue: two papers present results on realistic rendering on PC clusters, two papers focus on parallel volume rendering using graphics processing units, and two papers address large displays and visualization clusters. The paper by Chalmers et al. combines parallel processing on a cluster with visual perception to achieve high-fidelity, physically based selective rendering at close to interactive rates. Thomaszewski et al. also use a PC cluster to perform physically based simulations of cloth, modelling both the material properties and the interaction with the surrounding scene. Bernardon et al.
exploit CPU and GPU parallelism to render volumes of unstructured grids with time-varying data. Another volume rendering technique is presented by Müller et al., who use a sort-last approach to perform volume ray casting on the fragment shaders of a GPU cluster. Cotting et al. present a software genlock approach for Windows, compatible with off-the-shelf graphics hardware, which can be employed to build cost-effective VR installations such as large tiled displays. Lorenz and Brunnett add a new functionality to Chromium, where a new point-to-multipoint connection based on UDP allows rendering of large scenes synchronously on an arbitrary number of tiled displays at nearly constant performance. We hope that this special issue provides an interesting overview of parallel graphics and visualization. Further interest in the topic can be satisfied by following the Symposia on Parallel Graphics and Visualization, the 2007 edition taking place in Lugano, Switzerland.

    Optimal Automatic Multi-pass Shader Partitioning by Dynamic Programming

    Complex shaders must be partitioned into multiple passes to execute on GPUs with limited hardware resources. Automatic partitioning gives rise to an NP-hard scheduling problem that can be solved by any number of established techniques. One such technique, Dynamic Programming (DP), is commonly used for instruction scheduling and register allocation in the code generation phase of compilers. Since automatic partitioning occurs during the shader compilation process, it is natural to ask whether DP is useful for shader partitioning as well as for code generation. This paper demonstrates that these problems are Markovian and can be solved by DP techniques. It presents a DP algorithm for shader partitioning that can be adapted for use with any GPU architecture. Unlike solutions produced by other techniques, DP solutions are globally optimal. Experimental results on a set of test cases with a commercial prerelease compiler for a popular high-level shading language showed a DP algorithm had an average runtime cost of O(n^1.14966), which is less than O(n log n) on the region of interest in n. This demonstrates that efficient and optimal automatic shader partitioning can be an emergent byproduct of a DP-based code generator for a very high performance GPU.
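    The Markovian structure the abstract refers to can be seen in a deliberately simplified model: split a straight-line shader into the fewest passes such that no pass exceeds a resource limit. The cost of the best partition of a prefix depends only on where the last pass begins, so a one-dimensional DP suffices. The instruction encoding and resource model below are invented for illustration and are not the paper's algorithm.

```python
# Toy DP: partition a straight-line shader into the fewest passes,
# where each pass may touch at most max_temps distinct temporaries.
# dp[i] = fewest passes needed to schedule the first i instructions.
def min_passes(instrs, max_temps):
    n = len(instrs)
    INF = float("inf")
    dp = [0] + [INF] * n
    for j in range(1, n + 1):
        temps = set()
        for i in range(j - 1, -1, -1):     # candidate last pass instrs[i:j]
            temps |= set(instrs[i])
            if len(temps) > max_temps:
                break                       # growing the pass only adds temps
            if dp[i] + 1 < dp[j]:
                dp[j] = dp[i] + 1
    return dp[n]

# Each instruction is modelled as the set of temporaries it touches
# (a hypothetical IR, chosen only to exercise the recurrence).
shader = [{"r0"}, {"r0", "r1"}, {"r2"}, {"r2", "r3"}, {"r0", "r3"}]
passes = min_passes(shader, max_temps=2)
```

    Because every prefix cost is computed from optimal sub-solutions, the result is globally optimal for this model, mirroring the optimality claim made for the full shader-partitioning DP.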

    Scalable Distributed Visualization Using Off-the-Shelf Components

    This paper describes a visualization architecture for scalable computer systems. The architecture is currently being prototyped for use in Beowulf-class clustered systems. A set of OpenGL frame buffers are driven in parallel by a set of CPUs. The visualization architecture merges the contents of these frame buffers by user-programmable associative and commutative combining operations. The system hardware is built from off-the-shelf components including OpenGL accelerators, Field Programmable Gate Arrays (FPGAs), and gigabit network interfaces and switches. A second-generation prototype supports 60 Hz operation at 1024 x 1024 pixel resolution with interactive latency up to 1000 nodes. CR Categories: B.7.1 [Integrated circuits]: Types and design styles---Gate arrays; C.2.5 [Computer-communication networks]: Local and wide-area networks---High-speed; D.1.3 [Programming techniques]: Concurrent programming---Parallel programming; I.3.1 [Computer graphics]: Hardware architecture---Parallel processing; I.3.2 [Computer graphics]: Graphics systems---Distributed/network graphics. Keywords: FPGA, OpenGL, visualization, cluster, Beowulf, gigabit, fat-tree.

    A Parabolic Load Balancing Method

    This paper presents a diffusive load balancing method for scalable multicomputers. In contrast to other schemes which are provably correct, the method scales to large numbers of processors with no increase in run time. In contrast to other schemes which are scalable, the method is provably correct, and the paper analyzes the rate of convergence. To control aggregate CPU idle time it can be useful to balance the load to specifiable accuracy. The method achieves arbitrary accuracy by proper consideration of numerical error and stability. This paper presents the method, proves correctness, convergence and scalability, and simulates applications to generic problems in computational fluid dynamics (CFD). The applications reveal some useful properties. The method can preserve adjacency relationships among elements of an adapting computational domain. This makes it useful for partitioning unstructured computational grids in concurrent computations. The method can execute asynchronously to balanc..

    Scalable photo-realistic rendering of complex scenes

    Photorealistic rendering of complex scenes poses computational demands as great as those of any large-scale scientific or engineering calculation. Just as scientific calculations have benefited from access to scalable computing systems, so too can photorealistic rendering. This paper describes an application of scalable parallel processors to photorealistic rendering of complex scenes by Monte Carlo path tracing. The application uses scalable implementation methods in order to achieve good performance on large numbers of computers and on models which require large amounts of data. The implementation is a message-driven concurrent pipeline which employs a diffusion algorithm for dynamic load balancing. The application can be extended to partition extremely large models across physically distributed memory as well as to perform out-of-core calculations. 1 Introduction In recent years scalable parallel processors (SPPs) have become readily available to solve computationally intensive pr..